Anthropic replicated AI goal misalignment: 12% of models sabotaged code, 50% feigned alignment after learning identity hacks, creating self-reinforcing cheating loops via fine-tuning and prompt modifications.....
Anthropic's research has for the first time confirmed that AI training may inadvertently develop models with misaligned goals, meaning the AI's objectives are inconsistent with human intentions, which could lead to destructive consequences. The study induced models to learn cheating through two methods: fine-tuning (re-training with a large number of cheating documents) and carefully designed training processes.
Weibo launches the open-source large model Vibe Thinker, which has only 1.5 billion parameters but outperforms the 671 billion parameter DeepSeek R1 in mathematical competition benchmarks, with higher accuracy and a training cost of only $7,800. It adopts a lightweight MoE architecture and knowledge distillation technology, requiring only 5GB of mathematical corpus for fine-tuning. It supports downloading from Hugging Face and commercial use. The model performs outstandingly in international math competitions such as AIME.
Runway launches a video model fine-tuning tool for partners to customize AI models in verticals like robotics and education, enhancing performance with less data and computation.....
Isahit is a platform for managing staff, focusing on LLM fine-tuning and data processing to ensure the high quality and unbiased nature of AI agents.
An open-source platform for AI model fine-tuning and monetization, empowering AI startups, machine learning engineers, and researchers.
A platform for fine-tuning AI intelligent agents.
AI model fine-tuning, personalized customization.
mistral
$2.88
Input tokens/M
$14.4
Output tokens/M
256k
Context Length
prithivMLmods
Olmo-3-7B-Instruct-AIO-GGUF is a GGUF quantized version based on the Olmo-3-7B-Instruct model developed by the Allen Institute for AI. This is an autoregressive language model with 7 billion parameters, trained on datasets such as Tulu 2 and UltraFeedback through supervised fine-tuning and direct preference optimization, and performs excellently in question-answering and instruction following.
allenai
Olmo 3 is a series of language models developed by the Allen Institute for AI, including two scales of 7B and 32B, with two variants: instructional and reflective. This model performs excellently in long-chain thinking and can effectively improve the performance of reasoning tasks such as mathematics and coding. It adopts a multi-stage training method, including supervised fine-tuning, direct preference optimization, and reinforcement learning with verifiable rewards.
Olmo-3-7B-Think-DPO is a 7B parameter language model developed by the Allen Institute for AI. It has the ability of long-chain thinking and performs excellently in reasoning tasks such as mathematics and coding. This model has undergone multi-stage training including supervised fine-tuning, direct preference optimization, and reinforcement learning based on verifiable rewards, and is designed specifically for research and educational purposes.
SadraCoding
SDXL-Deepfake-Detector is a tool for accurately detecting AI-generated faces. It focuses on maintaining the authenticity of the digital world and provides a privacy-protected and open-source solution to combat visual misinformation. This model achieves lightweight and highly accurate detection through fine-tuning a pre-trained model.
ethicalabs
xLSTM-7b-Instruct is an experimental fine-tuned version based on NX-AI/xLSTM-7b, specifically optimized for instruction-following tasks. This model adds support for chat templates and uses TRL for supervised fine-tuning training, aiming to provide a better conversational interaction experience.
Tesslate
WEBGEN DEVSTRAL IMAGES is an AI model focused on web page generation. It can generate single-page web pages using HTML, CSS, JS, and Tailwind technologies. This project is trained based on custom templates and uses the supervised fine-tuning method, training with a dataset generated by GPT-OSS-120B.
EpistemeAI
This model is based on GPT-OSS-20B and fine-tuned using the Unsloth reinforcement learning framework. The aim is to optimize inference efficiency and reduce vulnerabilities that occur during reinforcement learning from human feedback (RLHF) training. The fine-tuning process focuses on the robustness and efficiency of alignment, ensuring that the model maintains inference depth without incurring excessive computational overhead.
trinty2535425
This is an image-to-video LoRA model trained based on the Qwen/Qwen-Image base model. It uses the LoRA (Low-Rank Adaptation) technology to achieve efficient fine-tuning and can be used for related tasks such as AI image generation.
facebook
DINOv3 is a series of general visual foundation models developed by Meta AI. Without fine-tuning, it can outperform specialized state-of-the-art models in a wide range of visual tasks. This model uses self-supervised learning to generate high-quality dense features and performs excellently in various tasks such as image classification, segmentation, and depth estimation.
DINOv3 is a versatile visual foundation model developed by Meta AI. It can outperform specialized models in a wide range of visual tasks without fine-tuning. This model can generate high-quality dense features and performs excellently in various visual tasks, significantly surpassing previous self-supervised and weakly supervised foundation models.
DINOv3 is a series of general visual foundation models developed by Meta AI. It can outperform specialized advanced models in various visual tasks without fine-tuning. The model adopts the Vision Transformer architecture and is pre-trained on 1.689 billion web images. It can generate high-quality dense features and performs excellently in tasks such as image classification, segmentation, and retrieval.
danielkty22
TARS-SFT-7B is a security reasoning model based on supervised fine-tuning. It serves as the basic model for reinforcement learning training and is specifically designed to enhance the security of AI systems. This model starts training from Qwen2.5-7B-Instruct and uses the reasoning process as an adaptive defense mechanism to improve the security performance of the model.
OLMo 2 1B is a post-training variant of the allenai/OLMo-2-0425-1B-RLVR1 model, undergoing supervised fine-tuning, DPO training, and RLVR training, aiming to achieve state-of-the-art performance across multiple tasks.
ritvik77
A medical diagnosis AI model optimized through LoRA fine-tuning and 4-bit quantization technology based on the Mistral-7B language model, focusing on symptom analysis and disease diagnosis assistance.
us4
Fin-LLaMA 3.1 8B is a large language model fine-tuned specifically for financial news data based on the LLaMA 3.1 architecture. This model uses the Unsloth library for efficient fine-tuning, adopts LoRA adapter technology, and provides multiple quantized GGUF formats, aiming to generate coherent and relevant financial, economic, and business text responses.
cypienai
Cymist2-v0.1 is an advanced language model developed by Cypien AI team, specifically optimized for Turkish and English text generation tasks, supporting Retrieval-Augmented Generation (RAG) and Supervised Fine-Tuning (SFT).
KnutJaegersberg
The gpt2-chatbot is a dialogue model obtained through supervised fine-tuning (SFT) on the Deita dataset based on the GPT2-XL architecture, aiming to change a certain decision tendency. This model supports multi-turn dialogue and performs well in text generation tasks, but has limitations in mathematical reasoning.
lightblue
A chatbot model fine-tuned based on ai21labs/Jamba-v0.1, supporting multilingual dialogue. After several hours of QLoRA fine-tuning, it can conduct reasonably fluent conversations in English and other languages.
NingLab
eCeLLM-M is a large language model for the e-commerce domain, obtained through instruction fine-tuning based on the Mistral-7B Instruct-v0.2 foundation model, aiming to enhance language understanding and generation capabilities in e-commerce scenarios.
neovalle
An ecological perception model based on DPO fine-tuning technology, aiming to enhance the perception ability of ecological problems and provide support for ecological research and sustainable development.
The project involves documentation, sample code repositories, and community resources for the LangChain framework, including technical content such as Python programming, AI agent development, FastAPI integration, and LLM fine-tuning.